SUMMARY

The memory subsystem for hierarchical disk
cache (MESSIAH) discussed in this paper aims
at high-speed access to secondary memory.
MESSIAH uses a large-capacity buffer memory
(A) in the input-output device and a
small-capacity buffer (B) in each disk
device. (A) realizes the traditional
disk-cache function. (B) realizes reduction
of the disk-rotation waiting time due to a
miss in RPS (rotational position sensing),
overlap of disk write-in with the seek
operation, and immediate delivery of
look-ahead data. In other words, in MESSIAH
(A) realizes a cache memory of sufficient
capacity and (B) copes with RPS misses,
which increase with the size of the transfer
block between disk and (A). (B) also
realizes high-speed write-in and read-out of
the disk. Thus, MESSIAH provides an
advantage greater than the product of the
effects of the disk-cache function and the
B-disk (buffer-installed disk device). This
paper proposes an architecture for MESSIAH
and verifies its usefulness by simulation.

1. Introduction 

With recent advances in the processing
capability of the CPU, bottlenecks
increasingly occur in the input-output
processing of computer systems. This
tendency will become more pronounced in the
future. In such a situation high-speed
access to the disk becomes one of the most
important problems. There are various means
for high-speed access to the disk, such as
addition of disks and channels, modification
of the blocking factors, introduction of
disk-cache [2, 3], expansion of main memory,
and replacement of memory by high-speed
devices like the CCD and magnetic bubble
memory. Another method of improvement is the
buffer-installed disk device (B-disk)
proposed by the authors [5]. B-disk has the
following three features: (1) the waiting
time due to an RPS miss [1] is reduced (A);
(2) in write-in, the seek operation and data
transfer overlap, which apparently
eliminates the seek time and the rotation
waiting time (B); (3) in readout, if
look-ahead data exist in the data buffer,
the rotation waiting time is eliminated (C).

The authors have proposed a B-disk for
high-speed input-output processing in order
to match the drastic improvement in CPU
processing ability. This paper further
proposes a memory subsystem for hierarchical
disk-cache (MESSIAH), which combines B-disk
and disk cache (DC) to realize high-speed
input-output processing with a multiplying
effect. Performance evaluation of DC and
MESSIAH is made for a system working at an
actual site.



2. Hierarchical Disk-Cache Subsystem 

2.1 Reduction of input-output process- 
ing time 

The disk cache (DC) is a direct method
of reducing the input-output processing time.
First, we explain the mechanism whereby the
DC can improve the input-output processing
time. The DC control scheme is also
described. Considering that the disk is a
rotating medium, the DC affects the
input-output processing time in the
following ways: (1) the seek time is
improved by reducing the number of arm
shifts in the disk device (D); (2) the
rotation waiting time is reduced by reducing
the number of disk rotation waits (E);
(3) on the other hand, data transfer between
the disk and DC is increased, involving an
increase of data transfer time (F).

Thus, in order to realize the full 
effectiveness of DC, the following relation 
should be satisfied with a sufficient margin: 

(effect of D) + (effect of E)

>> (effect of F)                         (1)

A number of schemes have been proposed
for DC control (e.g., write-through,
write-after, bypass mode, sequential access
mode [2] and a mode using DC only [2]). As
is seen from the principle of improvement
of the input-output processing time, the
merit (C) of B-disk is the same as the
merit (E) of the readout from DC. It is also
seen that the input-output processing time
can be improved with respect to each of the
above features.

It should be noted that, even when
the environment does not permit full
realization of the B-disk advantages, the
performance of the B-disk is not degraded as
much as in other magnetic disk devices.

2.2 Structure and operation of buffer- 
installed disk device 

B-disk is one of the important elements 
in MESSIAH. The B-disk structure and opera- 
tion are described below. The RPS function 
of the disk device improves the utilization 
efficiency of the disk controller (DKC) and 
reduces the rotation waiting time. On the
other hand, as the busy ratio of DKC
increases, RPS misses increase, which
increases the excess rotation waiting time.
B-disk is a disk device in which a buffer is
installed to minimize the above waiting time.

Figure 1 shows the detailed structure 
of the B-disk. In this figure, RDB is the 
read buffer and WDB is the write buffer. 
The buffer controller (BC) controls RDB and 
WDB. RDB stores three tracks of data: the
seek track and the two adjacent tracks.
It stores the data on these tracks
independently of requests from the
input-output control device (IOC). WDB is a
FIFO memory which stores the data to be
written into the disk. It receives the data
from DKC without waiting for the end of the
seek operation and the rotation wait. When
the seek and rotation wait operations are
completed, data transfer from WDB to the
disk is started.

By virtue of the above functions, data
transfer between the disk and DKC need no
longer be synchronized to the disk rotation,
which reduces the waiting time due to RPS
misses. During the write period WDB receives
the data to be written in parallel with the
seek operation, which eliminates the seek
time and the rotation waiting time
((B) and (C) in Sect. 1).

2.3 Structure and operation of MESSIAH 

Here we describe the memory subsystem
of the hierarchical disk cache (MESSIAH) 
which can realize the merits of both B-disk 
and DC by a hierarchical arrangement of 
the two schemes. Figure 2 shows the struc- 
ture of MESSIAH, wherein the ordinary mag- 
netic disk is replaced by B-disk and DC is 
installed in the input-output processing
device. The reasons for placing DC in the
input-output device are: (1) no additional
load is placed on the CPU and (2) as many
B-disks as possible can share DC,
effectively increasing the memory capacity
of DC.

As is seen in the structure of MESSIAH,
high-speed input-output processing is
realized by adding a small-capacity buffer
memory to the magnetic disk device and a
large-capacity buffer memory to the
input-output device of the ordinary computer
system.

The high-speed input-output processing
of MESSIAH results not only from the
simultaneous realization of the merits of
B-disk and DC but also from the remedy of
the DC shortcoming described in (F). In
general, in order to improve the hit ratio
of DC, as much data as possible should be
read ahead. This, however, increases the
data transfer and the size of the transfer
block between B-disk and DC, leading to more
frequent RPS misses. On the other hand,
B-disk can retain the object data in the
buffer memory, which eliminates the need for
the disk-rotation waiting in the event of an
RPS miss. For this reason, B-disk can
prevent performance degradation due to
frequent RPS misses, which has been a
serious problem in DC. In this sense,
MESSIAH realizes the multiplying effects of
B-disk and DC.

Thus, the merits of MESSIAH can be
summarized in the following six points.

(1) The seek time is reduced by reducing
the number of arm shifts in the disk
device (B-disk).

(2) The rotation waiting time is reduced
by reducing the number of disk-rotation
waits.

(3) The waiting time in RPS misses is
reduced.

(4) In writing, the seek operation
and data transfer overlap, which eliminates
the seek time and the rotation waiting time
for the disk device (B-disk).

(5) In reading, if the requested data
exist in the B-disk buffer, the rotation
waiting time is eliminated.

(6) The system can cope with frequent
RPS misses due to increased block size in
the data transfer between B-disk and DC.

Fig. 2. Configuration of MESSIAH.



3. Method of Performance Evaluation 

Performance evaluation was made for
a computer system in actual operation
(called DB 16 in this paper). DB 16 is a
system with 16 spindles of 300-MB disk
with optimal file location (system disk: 2
spindles and private disk: 14 spindles).
In DB 16 the database processing and
file-batch processing jobs are executed with
a job multiplicity of 8. A CODASYL-type
database is used.

3.1 Flow of performance evaluation 

The performance was evaluated by the 
following procedure (Fig. 3). 

(1) From statistical data of the actual
system (e.g., I/O frequency) the analysis
period is selected. The disk access
pattern of the trace data is analyzed. The
trace data are extracted and edited for
DC performance evaluation.

Fig. 3. Procedure of performance evaluation for
MESSIAH.

(2) Performance of DC is evaluated. 

(3) From the result of performance 
evaluation of DC, the trace data are selected 
for performance evaluation of MESSIAH and DC 
considering the RPS misses. 

(4) The trace data are edited for 
performance evaluation of MESSIAH and DC 
considering the RPS misses and the request 
rate of I/O is calculated. 

(5) The performance evaluation is 
made for MESSIAH and DC considering RPS 
misses. 

3.2 Simulator for disk cache 

The DC simulator uses the trace data 
of the disk access and evaluates the DC per- 
formance by calculating the cache hit ratio, 
improvement of access time and hit-depth 
pattern. The access time is calculated con- 
sidering the seek length of the disk device. 
In the simulator, DC is placed in the input- 
output processing device. 

A feature of the simulator is that 4
kinds of block sizes (1 page = 2 KB, 1
track, 2 tracks and 1 cylinder) can be
selected for the data transfer between DC
and the magnetic disk. In principle, the
unit of data management in DC is the same as
the transfer-block size. The exception is
the case where the transfer-block size is 2
tracks, where the unit of management is 1
track. By varying the block size of the
data transfer, the effect of prefetching
data at neighboring disk addresses can be
examined.
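
The four transfer-block granularities can be pictured as different ways of turning a disk address into the key under which DC manages the data. The following is only a sketch: the geometry constant TRACKS_PER_CYLINDER, the function names, and the choice of the next track as the second track of a 2-track transfer are assumptions made for illustration and are not taken from the measured system.

```python
from typing import List, Tuple

# Hypothetical geometry (assumption; the paper does not state it).
TRACKS_PER_CYLINDER = 19

Key = Tuple  # key of one management unit in DC


def management_key(cyl: int, track: int, page: int, block_size: str) -> Key:
    """Key under which the requested page is managed in DC.

    block_size is one of "1P", "1T", "2T", "1C".
    A 2-track transfer is still managed in units of 1 track.
    """
    if block_size == "1P":
        return ("page", cyl, track, page)
    if block_size in ("1T", "2T"):
        return ("track", cyl, track)
    if block_size == "1C":
        return ("cyl", cyl)
    raise ValueError(f"unknown block size: {block_size}")


def keys_loaded_on_miss(cyl: int, track: int, page: int,
                        block_size: str) -> List[Key]:
    """Management units brought into DC when the request misses."""
    keys = [management_key(cyl, track, page, block_size)]
    if block_size == "2T" and track + 1 < TRACKS_PER_CYLINDER:
        # Assumption: the second track is the next one in the cylinder.
        keys.append(("track", cyl, track + 1))
    return keys
```

With this mapping, a larger transfer block simply means that more neighboring pages share one key or are loaded together, which is the prefetch effect examined by the simulator.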

The control scheme for DC employs the 
write-through scheme, which possesses high 
reliability in file updating. The write- 
through scheme employed in this system is 
described below. 

Read: (i) If the object data are in
DC (hit), the data are transferred from DC
to main memory.

(ii) If the object data are not in DC,
the data are transferred from the disk to
both DC and main memory.

Write: (i) When hit, both the
corresponding record in DC and the disk are
updated.

(ii) When not hit, only the disk is
updated.

Cache replacement is performed by the LRU
scheme. Since the I/O requests in the object
system were smaller than 2 KB, a hit is
defined to occur when the page requested by
the I/O exists in DC. The input-output
service time (Tc) is defined as follows.

When hit: Tc = mean DC access time

When not hit: Tc = (seek time) + (mean
rotation waiting time) + (data
transfer time)
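
The write-through control and LRU replacement described in this section can be sketched as a small trace-driven loop. This is a minimal illustration, not the authors' simulator: the function name simulate_dc, the trace format, the decision to refresh LRU order on a write hit, and the use of a single fixed miss time in place of (seek time) + (mean rotation waiting time) + (data transfer time) are all assumptions. Only READ requests are counted in the hit ratio and the mean Tc.

```python
from collections import OrderedDict


def simulate_dc(trace, capacity_blocks, t_hit_ms, t_miss_ms):
    """Trace-driven sketch of a write-through, LRU-managed disk cache.

    trace: iterable of (op, block) pairs with op "R" or "W".
    Returns (read hit ratio, mean Tc over reads in ms).
    """
    cache = OrderedDict()  # oldest entry first, most recent last
    reads = hits = 0
    total_ms = 0.0
    for op, block in trace:
        if op == "R":
            reads += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)   # refresh LRU position
                total_ms += t_hit_ms
            else:
                total_ms += t_miss_ms      # data come from the disk
                cache[block] = True        # and are also placed in DC
                if len(cache) > capacity_blocks:
                    cache.popitem(last=False)  # evict least recently used
        elif op == "W":
            # Write-through: the disk is always updated. The DC copy is
            # updated only on a hit; a write miss does not allocate.
            if block in cache:
                cache.move_to_end(block)   # assumption: write hit refreshes LRU
        else:
            raise ValueError(f"unknown operation: {op}")
    if reads == 0:
        return 0.0, 0.0
    return hits / reads, total_ms / reads
```

For example, with a capacity of two blocks, a hit time of 1 ms and a miss time of 30 ms, the trace R1, R2, R1, W3, R3, R1 gives two hits in five reads.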

3.3 Simulation of MESSIAH 

MESSIAH is simulated using a DC
simulator, a B-disk simulator and an I/O
request edit program, which simulates DC
operation to produce the requests issued
from the input-output processing device to
B-disk.

3.4 Simulator for buffer-installed
disk device

The B-disk simulator is composed of
two simulators, with and without buffer. It
can evaluate the B-disk performance under
various traffic conditions. The simulator
for the disk without buffer uses the
Mitsubishi M2838-F 300-MB disk as the model.
The B-disk simulator employs the M2838-F
plus a buffer. Each of the simulators
operates assuming a system composed of a
CPU, a DKC and 8 disks. The IOP (I/O
processor) has little effect on the
performance and is not included in the model.



4. Result of Performance Evaluation 

4.1 Performance evaluation of disk 
cache 

The DC performance evaluation is made 
for the following 8 access patterns. 

(1) Database access (A, B, L, 2, 4)

(2) Database and file access (C, T) 

(3) File access (D) 

As the database (DB) a general CODASYL-type 
DB management system is considered, which 
realizes high-speed retrieval using a hash 
function. 

The data used in (1) - (3) were 
selected from large-scale typical or gen- 
eral data. B is an access pattern almost 
equal to a sequential search; A, L, 2 and 4 
are random-access patterns often observed 
in database access. For example, in the 
access pattern of A the maximum and mean 
seek length are 692 and 49.2, respectively. 
C and T are access patterns in which the 
access is made to the database and to the 
file nearly the same number of times. D is 
a sequential access pattern. 

Evaluation of cache hit ratio 

Figures 4-7 show the DC hit ratio
when the transfer block sizes are 1 page, 1
track, 2 tracks and 1 cylinder, respectively.
The cache size is represented on a
per-spindle basis. The maximum cache size is
set as 12 MB/spindle based on the following
analysis. In DC the probability that a hit
is produced at depth 30 or less of the LRU
stack is above 81% (88% in most cases). When
the transfer block size is 1 cylinder
(maximum), a cache size of 322 KB * 30 =
9.66 MB is required in order to realize an
LRU stack of depth 30.
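
The hit-depth analysis and the sizing arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and the list-based LRU stack is chosen for clarity rather than speed. A cache that holds N blocks catches exactly those references whose stack depth is N or less, which is why depth 30 at 322 KB per cylinder leads to 9.66 MB.

```python
def lru_stack_depths(blocks):
    """LRU stack depth of each reference (1 = most recently used).

    Returns None for the first reference to a block, which cannot hit
    in a cache of any size.
    """
    stack = []   # index 0 is the most recently used block
    depths = []
    for b in blocks:
        if b in stack:
            depths.append(stack.index(b) + 1)
            stack.remove(b)
        else:
            depths.append(None)
        stack.insert(0, b)
    return depths


def cache_size_mb(block_kb, depth):
    """Cache size needed to hold an LRU stack of the given depth."""
    return block_kb * depth / 1000.0   # 1 MB taken as 1000 KB, as in the text


def hit_ratio_at_depth(depths, max_depth):
    """Fraction of all references that hit within max_depth."""
    if not depths:
        return 0.0
    hits = sum(1 for d in depths if d is not None and d <= max_depth)
    return hits / len(depths)
```

Calling cache_size_mb(322, 30) reproduces the 9.66 MB figure quoted above.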

The following properties are observed
concerning the hit ratios of Figs. 4-7.

(1) The hit ratio saturates in most
cases at a cache size of 1 MB/spindle.

(2) The hit ratio is highest for a
transfer block size of 2 tracks, followed by
1 track and 1 page, indicating that prefetch
of a neighboring disk address is effective.
When the transfer block size is 1 cylinder,
however, 1 - 2 MB/spindle is not sufficient
for database access, resulting in a poor hit
ratio.

Input-output service time (Tc)

Table 1 shows the input-output service
time (Tc) utilizing DC for certain boundary
values. The following observations are made
concerning Tc for access patterns A, B, C
and D. The input-output processing speed is
improved by the following factors: A: 1.61
times, B: 1.43 times, C: 2.04 times, D: 2.38
times. The transfer block sizes are A: 1
page; B: 2 tracks; C: 2 tracks; D: 1
cylinder. For each access pattern the
following properties are seen. In A (DB), Tc
is reduced to 0.6 - 0.7 times, except for
the case of the transfer block size of 1
cylinder. In B (DB), C (DB and file) and D
(file), Tc is reduced to 0.7 - 0.8, 0.5
- 0.6 and 0.4 - 0.6 times, respectively.

Table 1. Reduction of input-output service time
(Tc) by DC

(Note) The table marks each case with one of four
symbols according to whether Tc is reduced to less
than 0.5 times, 0.5 to 0.6 times, 0.6 to 0.8 times,
or above 0.8 times.

It is seen from the above result that
Tc is reduced by 0.1 - 0.3 more in file
access than in database access. Concerning
the database access, DC does not work
effectively when the transfer block size is
large (1 cylinder) in A and small (1 page)
in B. The reason for this is that the access
pattern of A is random and that of B is
partly sequential. Thus, it is concluded
that the access pattern is important when DC
is employed in the database access.

Comparing the cases with transfer
block size of 1 track and 2 tracks, the
latter is seen to reduce the access time
by 1 to 12%. This indicates that prefetch
by the disk address is somewhat effective.
The effect of prefetch is the most
remarkable in D. The reason is that D has an
almost sequential access pattern.

Input-output service time (Tr)
considering RPS misses

The Tc performance evaluation discussed
up to this point does not take into
consideration the RPS misses produced by
contention for the disk controller (DKC)
among the magnetic disk devices.
Consequently, as the next step a simulation
was performed considering RPS misses. Table
2 shows the input-output service time for
access patterns A and D as obtained by
simulation and gives the results



Fig. 7. Hit ratio (transfer block size: 1
cylinder). Horizontal axis: cache size (MB).

Table 2. Input-output service time (Tr) considering RPS misses

Processing  DC       DC size (MB)  BS        RPS miss ratio (%)  Busy ratio (%)  Mean Tr (ms)
A           Without  -             max. 1 P  22.1 - 24.1         25.4            27.1
D           Without  -             max. 1 P  28.7 - 30.9         25.8            29.5
A           With     0.25          1 P       10.3 - 13.9         15.0            15.9
A           With     12.00         1 P       11.0 - 12.3         13.8            14.7
A           With     0.25          1 T       38.4 - 39.7         47.7            23.0
A           With     12.00         1 T       29.7 - 31.2         35.0            18.1
A           With     0.25          2 T       62.6 - 66.1         71.4            35.9
A           With     12.00         2 T       47.2 - 51.8         52.3            23.6
D           With     1.00          1 P       28.7 - 30.9         25.8            29.5
D           With     1.00          1 T       24.2 - 26.3         26.7            15.3
D           With     1.00          2 T       25.3 - 27.0         27.2            14.3
D           With     1.00          1 C       33.6 - 34.9         29.2            14.6

(Note 1) BS: transfer block size, P: page, T: track, C: cylinder.
(Note 2) Processing indicates the access pattern.



Table 3. Reduction ratio of Tr by MESSIAH

Processing  Use of MESSIAH  DC size (MB)  BS        Mean Tr (ms)  Mean ratio of Tr
A           Without         -             max. 1 P  27.1          1.0
D           Without         -             max. 1 P  29.5          1.0
A           With            0.25          1 P       9.0           0.33
A           With            12.00         1 P       7.5           0.28
A           With            0.25          1 T       12.5          0.46
A           With            12.00         1 T       9.1           0.34
A           With            0.25          2 T       25.6          0.94
A           With            12.00         2 T       13.2          0.49
D           With            1.00          1 P       13.4          0.45
D           With            1.00          1 T       6.5           0.22
D           With            1.00          2 T       5.8           0.20
D           With            1.00          1 C       2.9           0.098

(Note 1) BS: transfer block size, P: page, T: track, C: cylinder.
(Note 2) Processing indicates the access pattern.



for Tr = Tc + (RPS miss handling time),
the DKC busy ratio and the RPS miss ratio.
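
The relation Tr = Tc + (RPS miss handling time) can be illustrated with a very simple model. The sketch below is not the simulator used in this paper, which models DKC contention explicitly. It assumes that every reconnection attempt misses independently with the same probability and that each miss costs one full disk revolution, so the expected number of lost revolutions is p / (1 - p). The function names and the rotation time used in the example are hypothetical.

```python
def expected_rps_miss_delay_ms(miss_ratio, rotation_ms):
    """Expected extra waiting time caused by RPS misses.

    Assumption: misses are independent with probability miss_ratio and
    each miss costs one full revolution (geometric retry model).
    """
    if not 0.0 <= miss_ratio < 1.0:
        raise ValueError("miss_ratio must be in [0, 1)")
    return rotation_ms * miss_ratio / (1.0 - miss_ratio)


def service_time_with_rps_ms(tc_ms, miss_ratio, rotation_ms):
    """Tr = Tc + (RPS miss handling time) under the model above."""
    return tc_ms + expected_rps_miss_delay_ms(miss_ratio, rotation_ms)
```

Under this model the penalty grows faster than linearly with the miss ratio, which is consistent with the sharp rise of Tr when the RPS miss ratio in Table 2 exceeds about 40%.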

It is seen from these results that DKC
is a bottleneck of the system in A for a
transfer block size of 2 tracks. The reason
for this is as follows. The threshold
values of the RPS miss ratio and the DKC
busy ratio at which DKC becomes the
bottleneck of the system are both 30 - 40%.
The values in the simulation actually exceed
these thresholds. The same situation is
anticipated for transfer block size of 2
tracks. Thus, in the stand-alone application
of DC in DB processing, the designer should
refer to this result and take care that DKC
does not become a bottleneck.

Table 3 shows the reduction ratio of
the input-output service time (Tr) with
introduction of MESSIAH. A is the random
access pattern often observed in DB access
and D is a sequential access pattern. It is
seen that by using MESSIAH Tr can be reduced
to 0.94 - 0.28 and 0.45 - 0.098 times in A
and D, respectively. In terms of processing
speed, the factors are 1.1 - 3.6 and 2.2
- 10.2 times in A and D, respectively.
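
The reduction ratios and speed factors quoted here follow directly from the mean Tr values of Table 3. The short sketch below recomputes them. The variable and function names are hypothetical, and the values are copied from Table 3.

```python
# Mean Tr (ms) without MESSIAH, from Table 3.
BASE_TR_MS = {"A": 27.1, "D": 29.5}

# Mean Tr (ms) with MESSIAH, in the row order of Table 3.
MESSIAH_TR_MS = {
    "A": [9.0, 7.5, 12.5, 9.1, 25.6, 13.2],
    "D": [13.4, 6.5, 5.8, 2.9],
}


def ratio_range(pattern):
    """(worst, best) reduction ratio of Tr for one access pattern."""
    ratios = [t / BASE_TR_MS[pattern] for t in MESSIAH_TR_MS[pattern]]
    return max(ratios), min(ratios)


def speedup_range(pattern):
    """(smallest, largest) speed factor; a factor is 1 / ratio."""
    worst, best = ratio_range(pattern)
    return 1.0 / worst, 1.0 / best


for p in ("A", "D"):
    worst, best = ratio_range(p)
    low, high = speedup_range(p)
    print(f"{p}: ratio {worst:.3f} - {best:.3f}, speed x{low:.1f} - x{high:.1f}")
```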

It is seen from Table 2 and the B-disk
simulation that by using DC Tr changes to
1.3 - 0.54 and 1.0 - 0.48 times in A and D,
respectively. By using B-disk Tr can be
reduced to 0.62 and 0.45 times in A and D,
respectively. As a result of the evaluation
the following relation is derived:

(speed improvement by MESSIAH)
> (speed improvement by B-disk)
x (speed improvement by DC)              (2)

Relation (2) indicates the multiplying
effect achieved by the hierarchically
combined functions of B-disk and DC. This
advantage appears to be due to reduction of
the RPS miss (see 2.3) upon increased
transfer block (or transfer data) between DC
and magnetic disk device (B-disk).
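
Relation (2) can be checked against the best-case figures quoted in the text. The sketch below uses only those best-case reduction ratios (MESSIAH 0.28 and 0.098, B-disk 0.62 and 0.45, DC 0.54 and 0.48 for A and D, respectively). It does not examine every configuration of Tables 2 and 3, and the names are hypothetical.

```python
# Best-case reduction ratios of Tr quoted in the text.
BEST_RATIO = {
    "A": {"messiah": 0.28, "b_disk": 0.62, "dc": 0.54},
    "D": {"messiah": 0.098, "b_disk": 0.45, "dc": 0.48},
}


def speedup(ratio):
    """Speed improvement factor corresponding to a reduction ratio."""
    return 1.0 / ratio


def multiplying_effect(pattern):
    """Return (MESSIAH speedup, B-disk speedup x DC speedup)."""
    r = BEST_RATIO[pattern]
    combined = speedup(r["b_disk"]) * speedup(r["dc"])
    return speedup(r["messiah"]), combined


for p in ("A", "D"):
    m, c = multiplying_effect(p)
    print(f"{p}: MESSIAH x{m:.2f} vs B-disk x DC = x{c:.2f}")
```

For these figures the MESSIAH speedup is about 3.6 against a product of about 3.0 in A, and about 10.2 against about 4.6 in D.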

The following merits were also verified
by simulation. The write time to the disk
is reduced by using B-disk. The read time
is reduced for the previously accessed
track and adjacent tracks. The read time is
reduced by the large-capacity DC, which has
been difficult to achieve with the ordinary
small-capacity B-disk. The input-output
processing speed is improved by decreasing
the number of accesses to B-disk. From these
results it is concluded that the MESSIAH
structure is very useful. The size of buffer
memory needed in MESSIAH is several tens of
KB/spindle in B-disk, and approximately
1 MB/spindle in DC.

Further speed improvement by MESSIAH 

This paper evaluated the performance
of a scheme whereby DC in MESSIAH is
controlled by a write-through scheme. The
following two points were observed.

(1) For access patterns such as A,
L and 2, which are often observed in DB
access, WRITE instructions that partially
rewrite data already transferred to DC by a
READ instruction account for 80 - 89% of
all WRITE instructions.

(2) For access patterns such as D,
which makes almost sequential access to the
file, the probability that a WRITE
instruction for B-disk accesses a track
accessed by the previous READ or WRITE
instruction is 57.4 - 67.7% of the total
READ and WRITE instructions. The probability
that the same track or an adjacent track is
accessed is 81.7 - 97.4% of the total READ
and WRITE instructions.
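
Track-locality figures of the kind given in (2) can be obtained from a trace by comparing each access with the one before it. The sketch below is a simplification: it treats READ and WRITE accesses alike and takes a plain sequence of linear track numbers, whereas the statistic in (2) concerns WRITE instructions in particular. The function name and the trace format are assumptions.

```python
def track_locality(tracks):
    """Locality of consecutive accesses in a trace of track numbers.

    Returns (fraction on the same track as the previous access,
             fraction on the same or an adjacent track).
    """
    pairs = len(tracks) - 1
    if pairs <= 0:
        return 0.0, 0.0
    same = near = 0
    for prev, cur in zip(tracks, tracks[1:]):
        if cur == prev:
            same += 1
        if abs(cur - prev) <= 1:   # same track or a neighboring track
            near += 1
    return same / pairs, near / pairs
```

A high second fraction means that most requests fall within the three tracks held in RDB, which is the situation reported for access pattern D.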

The following observations are made
from the above two points. It is seen from
(1) that further improvement of the
input-output processing speed can be made
for such access patterns as A, which is
often seen in DB access, by controlling the
MESSIAH DC by a write-after scheme. It is
seen from (2) that further improvement of
the speed can be made for such access
patterns as D, which makes an almost
sequential access to the file, by
controlling the MESSIAH DC by a write-after
scheme.

Thus, further improvement of the
input-output processing speed can be made by
controlling the MESSIAH DC by a write-after
scheme.



5. Conclusion

This paper proposed a memory subsystem
of hierarchical disk-cache (MESSIAH), which
is a hierarchical combination of the
functions of B-disk and DC. Performance
evaluation was made by simulation for
typical or general data selected from the
trace data of database and file processing
obtained from a system in actual operation.
Performance evaluations of B-disk and DC
were made in order to examine the
performance of MESSIAH.

As a result, it was seen that by
introducing MESSIAH with write-through
control of DC the speed of input-output
processing can be improved by up to 3.6 and
10.2 times in database processing (A) and
file processing (D), respectively. The
relation

(effect of MESSIAH) > (effect of
B-disk) x (effect of DC)

was established, thus confirming by
simulation that MESSIAH realizes the
multiplying effect of B-disk and DC.

The size of buffer memory needed in
MESSIAH in order to improve the input-output
processing speed is several tens of
KB/spindle for B-disk and approximately
1 MB/spindle for DC, which are sufficiently
practical values. The performance of MESSIAH
was evaluated in this paper with the DC
controlled by a write-through scheme. The
possibility of further speed improvement for
input-output processing was investigated by
adopting a write-after control scheme for
DC. The detailed evaluation of this scheme
is left for further study.